"VIDEO CODING METHOD AND DEVICE"



PHFR020093



FIELD OF THE INVENTION 

The present invention relates to a three-dimensional (3D) video coding method for
the compression of a bitstream corresponding to an original video sequence that has been
divided into successive groups of frames (GOFs) the size of which is N = 2^n with n being an
integer, said coding method comprising the following steps, applied to each successive GOF of
the sequence :

a) a spatio-temporal analysis step, leading to a spatio-temporal multiresolution
decomposition of the current GOF into 2^n low and high frequency temporal subbands, said step
itself comprising :

- a motion estimation sub-step ; 

- based on said motion estimation, a motion compensated temporal filtering
sub-step, performed on each of the 2^(n-1) couples of frames of the current GOF ;

- a spatial analysis sub-step, performed on the subbands resulting from said
temporal filtering sub-step ;

b) an encoding step, performed on said low and high frequency temporal subbands
resulting from the spatio-temporal analysis step and on motion vectors obtained by means of
said motion estimation step and delivering an embedded coded bitstream.

The invention also relates to a video coding device for carrying out said coding
method.

BACKGROUND OF THE INVENTION 

Video streaming over heterogeneous networks requires a high scalability capability.
That means that parts of a bitstream can be decoded without a complete decoding of the
sequence and combined to reconstruct the initial video information at lower spatial or temporal
resolutions (spatial/temporal scalability) or with a lower quality (PSNR or bitrate scalability). A
convenient way to achieve all three types of scalability (spatial, temporal, PSNR) is a
three-dimensional (3D, or 2D + t) subband decomposition of the input video sequence, after a
motion compensation of said sequence.

Current standards like MPEG-4 have in fact implemented limited scalability in a
predictive DCT-based framework through additional high-cost layers. However, more efficient
solutions based on a three-dimensional subband decomposition followed by a hierarchical
encoding of the spatio-temporal trees - performed by means of an encoding module based on
the technique named Fully Scalable Zerotree (FSZ) - have been recently proposed as an
extension of still image coding techniques for video : the 3D or (2D+t) subband decomposition
provides a natural spatial resolution and frame rate scalability, while the in-depth scanning of
the coefficients in the hierarchical trees and the progressive bitplane encoding technique lead to
the desired quality scalability. A higher flexibility is then obtained at a reasonable cost in terms
of coding efficiency.

The ISO/IEC MPEG normalization committee launched at the 58th Meeting in
Pattaya, Thailand, December 3-7, 2001, a dedicated AdHoc Group (AHG on Exploration of
Interframe Wavelet Technology in Video Coding) in order to, among other things, explore
technical approaches for interframe (e.g. motion-compensated) wavelet coding and analyze them in
terms of maturity, efficiency and potential for future optimization. The codec described in the
document 42GiVEPaiA)4361-(^ 

that shows a temporal subband decomposition with motion compensation. 

This 3D wavelet decomposition with motion compensation is applied to a GOF, the
frames being referenced F1 to F8 and organized in successive couples of frames. Each GOF is
motion-compensated (MC) and temporally filtered (TF), thanks to a Motion Compensated
Temporal Filtering (MCTF) module. At each temporal decomposition level, resulting low
frequency temporal subbands are further filtered, and the process stops when there is only one
temporal low frequency subband left. This root temporal subband, called LLL in Fig. 1 where
three stages of decomposition are shown (L and H = first stage ; LL and LH = second stage ;
LLL and LLH = third stage), represents a temporal approximation of the input GOF. Also at each
decomposition level, a group of motion vector fields is generated (in Fig. 1, MV4 at the first level,
MV3 at the second one, MV2 at the third one).
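
The dyadic structure described above can be made concrete with a short sketch. The following Python fragment is illustrative bookkeeping only: for a GOF of N = 2^n frames it lists how many couples of frames are filtered at each temporal level and which subband labels result. The function name temporal_levels and the label strings are assumptions of this sketch; no filtering or motion estimation is performed.

```python
def temporal_levels(gof_size):
    """Return (couples, low_label, high_label) for each temporal level.

    gof_size must be a power of two (N = 2**n). Labels follow the naming
    used in the text: L/H, then LL/LH, then LLL/LLH, and so on.
    """
    if gof_size < 2 or gof_size & (gof_size - 1):
        raise ValueError("GOF size must be a power of two >= 2")
    levels = []
    frames = gof_size      # frames (or low subbands) entering this level
    prefix = ""            # accumulated "L" prefixes from earlier levels
    while frames > 1:
        couples = frames // 2          # one ME/MC operation per couple
        levels.append((couples, prefix + "L", prefix + "H"))
        prefix += "L"
        frames = couples               # only the low subbands are filtered again
    return levels


if __name__ == "__main__":
    for couples, low, high in temporal_levels(8):
        print(couples, low, high)
```

For N = 8 the sketch yields three levels with 4, 2 and 1 couples, that is N - 1 = 7 couples in total, and the high subbands of all levels plus the single root low subband add up to N subbands.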

After these two operations (MC, TF) performed in the MCTF module, the frames of
the temporal subbands thus obtained are further spatially decomposed and yield a
spatio-temporal tree of subband coefficients. With Haar filters used for the temporal filtering
operations, motion estimation (ME) and motion compensation (MC) are only performed every
two frames of the input sequence, the total number of ME/MC operations required for the whole
temporal tree being roughly the same as in a predictive scheme. Using these very simple filters,
the low frequency temporal subband represents a temporal average of the input couple of
frames, whereas the high frequency one contains the residual error after the MCTF operation.
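
As a minimal illustration of the Haar case, the sketch below filters one couple of frames without any motion compensation. That omission is an assumption made to keep the example short; in the scheme described here the filtering would follow the motion trajectories. Frames are plain lists of pixel rows, the unnormalized average/difference form is one common convention among several, and the function names are hypothetical.

```python
def haar_pair(frame_a, frame_b):
    """Temporal Haar filtering of one couple of equally sized frames.

    Returns (low, high): low is the per-pixel average of the couple,
    high is the per-pixel difference (the residual).
    Motion compensation is deliberately omitted in this sketch.
    """
    low = [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(frame_a, frame_b)]
    high = [[b - a for a, b in zip(ra, rb)] for ra, rb in zip(frame_a, frame_b)]
    return low, high


def inverse_haar_pair(low, high):
    """Recover the original couple from its low and high subbands."""
    frame_a = [[l - h / 2 for l, h in zip(rl, rh)] for rl, rh in zip(low, high)]
    frame_b = [[l + h / 2 for l, h in zip(rl, rh)] for rl, rh in zip(low, high)]
    return frame_a, frame_b
```

When the two frames of a couple are identical, the high subband is zero everywhere, which is why a good motion compensation concentrates the energy of the GOF in the low subbands.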

It may then be observed that the whole efficiency of any MC 3D subband video
coding scheme depends on the specific efficiency of its MCTF module in compacting the
temporal energy of the input GOF. Said efficiency itself depends on the motion information and
the way in which such information is processed. For instance, in low motion activity video
sequences, a strong temporal correlation exists between the input frames, which is no longer
verified in high motion activity sequences.

SUMMARY OF THE INVENTION 

It is therefore an object of the invention to propose an encoding method with
which an improved coding efficiency is obtained by taking into account the above-mentioned
observation related to the motion activity.




To this end, the invention relates to a coding method such as defined in the
introductory paragraph of the description and which is moreover characterized in that said
spatio-temporal analysis step also comprises a decision sub-step for dynamically choosing the
input GOF size, said decision sub-step itself comprising a motion activity pre-analysis operation
based on the MPEG-7 Motion Activity descriptors and performed on the input original frames of
the first temporal decomposition level to be motion compensated and temporally filtered.

According to a particularly advantageous implementation, said method is
characterized in that said decision sub-step, based on the Intensity of activity attribute of the
MPEG-7 Motion Activity Descriptors for all the frames or subbands of the current temporal
decomposition level, comprises, for the first temporal decomposition level having a GOF size
equal to N input original frames, the following operations :

a) perform ME between each couple of frames that compose said first level :

- for each couple : 

- compute the standard deviation of motion vector magnitude ; 

- compute the activity value. 

b) compute the average activity intensity I(av) :

- if I(av) is strictly above a specified value, for instance corresponding to a
medium intensity, it is decided to reduce the input GOF size by halving N and to repeat the analysis
on the new GOF thus obtained ;

- if I(av) is equal to said specified value, it is decided to keep the current GOF
size value and perform MCTF on this GOF ;

- if I(av) is strictly below said specified value, it is decided to increase the input
GOF size by doubling N and to repeat the analysis on the new GOF thus obtained.

Since the GOF size selection for the first temporal decomposition level (composed
of input original frames) is partly based on the ME of these frames, this technical solution leads
to a low complexity increase of the overall MCTF module, which will eventually re-use
this very same motion information for its own process. Moreover, it must be noted that
changing from one GOF size to another does not require a complete re-analysis of the input
original frames, since much of the motion information is already available.

It is another object of the invention to propose a coding device for carrying out
such a coding method.

To this end, the invention relates to a video coding device for the compression of a
bitstream corresponding to an original video sequence that has been divided into successive
groups of frames (GOFs) the size of which is N = 2^n with n being an integer, said coding device
comprising the following elements :

a) spatio-temporal analysis means, applied to each successive GOF of the sequence
and leading to a spatio-temporal multiresolution decomposition of the current GOF into 2^n low
and high frequency temporal subbands, said analysis means themselves comprising :

- a motion estimation circuit ; 
- based on the result of said motion estimation, a motion compensated temporal
filtering circuit, applied to each of the 2^(n-1) couples of frames of the current GOF ;

- a spatial analysis circuit, applied to the subbands delivered by said temporal 
filtering circuit ; 

b) encoding means, applied to the low and high frequency temporal subbands 
delivered by said spatio-temporal analysis means and to motion vectors delivered by said 
motion estimation circuit, said encoding means delivering an embedded coded bitstream ;
said coding device being further characterized in that said spatio-temporal analysis means also
comprise a decision circuit for choosing the input GOF size, said decision circuit itself comprising
a motion activity pre-analysis stage, using the MPEG-7 Motion Activity descriptors and applied to
the input frames of the first temporal decomposition level to be motion compensated and
temporally filtered.

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will now be described with reference to the accompanying
drawings in which Fig. 1 illustrates a temporal multiresolution analysis with motion
compensation.

DETAILED DESCRIPTION OF THE INVENTION 

As said above, the whole efficiency of any MC 3D subband video coding scheme
depends on the specific efficiency of its MCTF module in compacting the temporal energy of the
input GOF. As the parameter "GOF size" is a major one for the success of MCTF, it is proposed,
according to the invention, to derive this parameter from a dynamical Motion Activity
pre-analysis of the input original frames (the ones that compose the first temporal level) to be
motion-compensated and temporally filtered, using normative (MPEG-7) motion descriptors (see
the document "Overview of the MPEG-7 Standard, version 6.0", ISO/IEC JTC1/SC29/WG11
N4509, Pattaya, Thailand, December 2001, pp.1-93). The following description will define which
descriptor is used and how it influences the choice of the above-mentioned encoding parameter.

In the 3D video coding scheme described above, ME/MC is generally arbitrarily
performed on each couple of frames (or subbands) of the current temporal decomposition level.
It is now proposed, according to the invention, to dynamically choose the input GOF size
according to the "intensity of activity" attribute of the MPEG-7 Motion Activity Descriptors, and
this for all the frames of the first temporal decomposition level. In the present example of
implementation, "intensity of activity" takes integer values within the [1, 5] range : for
instance 1 means a "very low intensity" and 5 means "very high intensity". This Activity
Intensity attribute is obtained by performing ME as it would be done anyway in a conventional
MCTF scheme and using statistical properties of the motion-vector magnitude thus obtained.
The quantized standard deviation of motion-vector magnitude is a good metric for the motion
Activity Intensity, and the Intensity value can be derived from the standard deviation using
thresholds. The input GOF size will therefore be obtained as now described :
for the first temporal decomposition level having a GOF size equal to N input original frames,
the following operations are performed :

a) perform ME between each couple of frames that compose said first level : 

- for each couple : 

- compute the standard deviation of motion vector magnitude ; 

- compute the activity value. 

b) compute the average Activity Intensity I(av) : 

- if I(av) is strictly above a user-specified value (for instance
corresponding to a medium intensity), it is decided to reduce the input GOF size by halving N and
to repeat the analysis on the new GOF thus obtained ;

- if I(av) is equal to said specified value, it is decided to keep the current
GOF size value and perform MCTF on this GOF ;

- if I(av) is strictly below said specified value, it is decided to increase the
input GOF size by doubling N and to repeat the analysis on the new GOF thus obtained.
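
The procedure above can be sketched as follows. This is a hedged illustration, not the normative MPEG-7 extraction: motion vectors are assumed to be given as (dx, dy) tuples for each couple of frames, the quantization thresholds in THRESHOLDS are placeholder values, the specified value defaults to 3 (taken here to mean "medium"), and I(av) is compared as a raw average because the text does not say whether it is rounded. All function and parameter names are hypothetical.

```python
import math
import statistics

# Placeholder thresholds on the standard deviation of motion-vector
# magnitude; real values depend on resolution and frame rate (assumption).
THRESHOLDS = (4.0, 11.0, 17.0, 32.0)


def activity_intensity(motion_vectors, thresholds=THRESHOLDS):
    """Quantize the std. deviation of vector magnitude to an integer in [1, 5]."""
    magnitudes = [math.hypot(dx, dy) for dx, dy in motion_vectors]
    sigma = statistics.pstdev(magnitudes)
    # Intensity is 1 plus the number of thresholds reached.
    return 1 + sum(sigma >= t for t in thresholds)


def average_intensity(fields):
    """Average activity intensity I(av) over the couples of the first level."""
    return statistics.mean(activity_intensity(f) for f in fields)


def decide_gof_size(n, i_av, target=3):
    """Apply the three-way rule: halve, keep, or double the GOF size N."""
    if i_av > target:
        return n // 2      # high activity: shorter GOF, analysis is redone
    if i_av < target:
        return n * 2       # low activity: longer GOF, analysis is redone
    return n               # equal: keep N and perform MCTF
```

The text does not state bounds on N or a stopping condition for the repeated analysis; a practical implementation would presumably clamp N to a minimum and a maximum value and limit the number of iterations.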

If the GOF size is doubled, that means that the first half of the new GOF will be
composed of the already loaded frames and the other half of the following frames, and the
analysis (ME and I(av) computation) will be made only on the newly loaded frames. Otherwise,
if the GOF size is halved, all the information needed for the new analysis has
already been computed and only I(av) must be recomputed for the half-GOF. Therefore, the present
invention represents a small overall complexity increase in comparison with a conventional
process in which the GOF size is arbitrarily chosen and fixed for the whole sequence.
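
The reuse argument can be illustrated with a small cache keyed by the index of each couple of frames. In this sketch the per-couple computation is passed in as a callable, intensity_of_couple, a hypothetical stand-in for ME followed by the activity measurement, so that the number of actual ME runs can be counted. The class and attribute names are assumptions of the example.

```python
class ActivityCache:
    """Remember per-couple activity values so ME is never repeated."""

    def __init__(self, intensity_of_couple):
        self._compute = intensity_of_couple   # stand-in for ME + activity measure
        self._values = {}                     # couple index -> activity value
        self.me_runs = 0                      # number of couples actually analysed

    def average(self, first_frame, gof_size):
        """Return I(av) for the GOF starting at first_frame (an even index)."""
        couples = range(first_frame // 2, (first_frame + gof_size) // 2)
        for c in couples:
            if c not in self._values:
                self._values[c] = self._compute(c)
                self.me_runs += 1
        return sum(self._values[c] for c in couples) / len(couples)
```

With such a cache, halving the GOF triggers no new ME at all, and doubling it triggers ME only on the couples of the newly loaded frames.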
CLAIMS:

1. A three-dimensional (3D) video coding method for the compression of a bitstream
corresponding to an original video sequence that has been divided into successive groups of
frames (GOFs) the size of which is N = 2^n with n being an integer, said coding method
comprising the following steps, applied to each successive GOF of the sequence:

a) a spatio-temporal analysis step, leading to a spatio-temporal multiresolution
decomposition of the current GOF into 2^n low and high frequency temporal subbands, said step
itself comprising:

- a motion estimation sub-step ; 

- based on said motion estimation, a motion compensated temporal filtering
sub-step, performed on each of the 2^(n-1) couples of frames of the current GOF ;

- a spatial analysis sub-step, performed on the subbands resulting from said 
filtering sub-step ; 

b) an encoding step, performed on said low and high frequency temporal subbands 
resulting from the spatio-temporal analysis step and on motion vectors obtained by means of 
said motion estimation step and delivering an embedded coded bitstream; 

said coding method being further characterized in that said spatio-temporal analysis step also 
comprises a decision sub-step for dynamically choosing the input GOF size, said decision
sub-step itself comprising a motion activity pre-analysis operation based on the MPEG-7 Motion
Activity descriptors and performed on the input original frames of the first temporal 
decomposition level to be motion compensated and temporally filtered. 
2. A coding method according to claim 1, said decision sub-step being based on the
Intensity of activity attribute of the MPEG-7 Motion Activity Descriptors for all the frames of the
first temporal decomposition level and comprising the following operations :

a) perform ME between each couple of frames that compose said first level : 
- for each couple : 

- compute the standard deviation of motion vector magnitude ; 

- compute the Activity value. 

b) compute the average Activity Intensity I(av) : 

- if I(av) is strictly above a user-specified value (for instance
corresponding to a medium intensity), it is decided to reduce the input GOF size by halving N and
to repeat the analysis on the new GOF thus obtained ;

- if I(av) is equal to said specified value, it is decided to keep the
current GOF size value and perform MCTF on this GOF ;

- if I(av) is strictly below said specified value, it is decided to increase
the input GOF size by doubling N and to repeat the analysis on the new GOF thus obtained.

3. A video coding device for the compression of a bitstream corresponding to an
original video sequence that has been divided into successive groups of frames (GOFs) the size
of which is N = 2^n with n being an integer, said coding device comprising the following
elements :

a) spatio-temporal analysis means, applied to each successive GOF of the sequence
and leading to a spatio-temporal multiresolution decomposition of the current GOF into 2^n low
and high frequency temporal subbands, said analysis means themselves comprising :

- a motion estimation circuit ;

- based on the result of said motion estimation, a motion compensated temporal
filtering circuit, applied to each of the 2^(n-1) couples of frames of the current GOF ;

- a spatial analysis circuit, applied to the subbands delivered by said temporal
filtering circuit ;

b) encoding means, applied to the low and high frequency temporal subbands
delivered by said spatio-temporal analysis means and to motion vectors delivered by said 
motion estimation circuit, said encoding means delivering an embedded coded bitstream ; 
said coding device being further characterized in that said spatio-temporal analysis means also 
comprise a decision circuit for choosing the input GOF Size, said decision circuit itself comprising 
a motion activity pre-analysis stage, using the MPEG-7 Motion Activity descriptors and applied to
the input frames of the first temporal decomposition level to be motion compensated and
temporally filtered. 
The invention relates to a three-dimensional (3D) video coding method for the
compression of a coded bitstream corresponding to an original video sequence that has been
divided into successive groups of frames (GOFs). This method, applied to each GOF of the
sequence, comprises : (a) a spatio-temporal analysis step, leading to a spatio-temporal
multiresolution decomposition of the current GOF into low and high frequency temporal
subbands and itself comprising a motion estimation sub-step, a motion compensated temporal
filtering sub-step and a spatial analysis sub-step, and ; (b) an encoding step, performed on said
low and high frequency temporal subbands and on motion vectors obtained by means of said
motion estimation step. According to the invention, said spatio-temporal analysis step also
comprises a decision sub-step for dynamically choosing the input GOF size, said decision
sub-step itself comprising a motion activity pre-analysis operation based on the MPEG-7 Motion
Activity descriptors and performed on the input frames of the first temporal decomposition level
to be motion compensated and temporally filtered.